Methods and structure for rapid background initialization of a RAID logical unit

ABSTRACT

Methods and structure for initializing a RAID storage volume substantially in parallel with processing of host generated I/O requests. Initialization of a RAID volume may be performed as a background task in one aspect of the invention while host generated I/O requests proceed in parallel with the initialization. The initialization may preferably be performed by zeroing all data including parity for each stripe to thereby make each stripe XOR consistent. Host generated I/O requests to write information on the volume may utilize standard read-modify-write requests where the entire I/O request affects information in a portion of the volume already initialized by background processing. Other host I/O requests use standard techniques for generating parity for all stripes affected by the write requests. These and other features and aspects of the present invention make a newly defined RAID volume available for host processing as quickly as possible.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The invention relates generally to storage subsystems and more specifically relates to techniques for initializing a new logical unit as a background process substantially overlapped with host I/O request processing on the new logical unit.

[0003] 2. Related Patents

[0004] This patent is related to co-pending, commonly owned U.S. patent application Ser. No. 10/210,384 (internal docket number 02-0403), filed Aug. 1, 2002, entitled METHOD AND APPARATUS FOR COPYING DATA BETWEEN STORAGE VOLUMES OF STORAGE SYSTEMS, incorporated herein by reference (hereinafter the '384 application). This patent is also related to commonly owned U.S. Pat. No. 6,467,023, issued Oct. 15, 2002, entitled METHOD FOR LOGICAL UNIT CREATION WITH IMMEDIATE AVAILABILITY IN A RAID STORAGE ENVIRONMENT, incorporated herein by reference (hereinafter the '023 patent). This patent is also related to co-pending, commonly owned U.S. patent application Ser. No. 02-6292, filed Apr. 28, 2003, entitled METHODS AND STRUCTURE FOR IMPROVED FAULT TOLERANCE DURING INITIALIZATION OF A RAID LOGICAL UNIT, incorporated herein by reference and also referred to herein as the “sibling” patent application.

[0005] 3. Discussion of Related Art

[0006] As complexity of computing applications has evolved, so too have demands for reliability and speed in associated storage subsystems. In general, computing storage subsystems are used for storage and retrieval of programs and data associated with operation of various programs. The mission critical nature of some applications has led to corresponding demands for increased reliability in storage subsystems. Further, high-performance storage related applications, such as multimedia data capture and replay, have contributed to increased demands for performance on storage subsystems.

[0007] RAID storage management techniques (Redundant Array of Independent Disks) have been employed for some time to enhance both performance and reliability in such high-performance, high reliability storage applications. Striping techniques applied within RAID storage management distribute stored data over multiple independent disk drives thereby enhancing storage performance by distributing storage and retrieval operations over a plurality of disk drives operable in parallel. Redundancy techniques employed within RAID storage subsystems enhance reliability of the storage subsystem by generating and maintaining redundancy information associated with the user supplied data. The redundancy information ensures that failure of any single disk drive does not risk loss of data and, in some cases, allows the RAID storage subsystem to continue operation (though often in a degraded mode).

[0008] RAID storage management encompasses a number of storage management techniques for distributing data (striping) and for generating, maintaining, and distributing redundancy information over a plurality of drives. Each of these RAID management techniques is typically referred to as a “level” such as RAID level 0, RAID level 1, RAID level 5, etc. One common RAID storage management technique, often referred to as RAID level 5, distributes user data over a plurality of drives and associates therewith an additional portion of data (redundancy information) generated by use of XOR parity operations. A stripe of data consists of distributed portions of user data and the associated redundancy information. A volume or logical unit (LUN) comprises a plurality of such stripes distributed over a subset of disk drives in the storage subsystem.

[0009] Typically a RAID controller, often integrated within the storage subsystem, applies RAID storage management techniques to store and retrieve such stripes on the disk drives of the storage subsystem. The RAID storage controller hides from the host systems information relating to the specific locations of individual portions of data and hides information regarding generation and maintenance of the required redundancy information. To an attached host computing system, the RAID storage controller makes a volume or logical unit appear essentially as a single, highly reliable, high-performance, high-capacity disk drive. In general, the RAID storage controller provides a mapping function from logical addresses or locations specified in host I/O requests to physical storage locations corresponding to the striped information distributed over multiple disks.

[0010] In RAID level 5 storage management (as well as other forms of RAID storage management such as RAID level 6) it is important that a newly defined storage volume be made “XOR consistent”. XOR consistency as used herein refers to the state of each stripe such that the data in the stripe and the associated redundancy information are consistent, i.e., the parity information corresponds to the associated data of the stripe. While RAID level 5 uses XOR parity, “XOR consistent” as used herein refers to any redundancy information stored in the array of disk drives that make up a volume. For example, XOR consistent may also refer to the redundancy information used in RAID level 6 and the mirrored redundancy information used in RAID level 1. Therefore, although the problems presented herein are discussed in detail with respect to RAID level 5, similar problems are encountered in other RAID management levels where a new volume must be initialized to make the redundant information ready for use.

[0011] When a new volume is defined by allocating portions of storage capacity distributed over a plurality of drives, the volume may initially include random data leftover from previous utilization of the disk drives or generated from some other source. In general, the initial information on a newly defined storage volume will not be XOR consistent.

[0012] A common technique applied to assure XOR consistency in a newly defined RAID level 5 volume is to read every block of data in a stripe, compute a corresponding XOR parity block, and write the newly computed XOR parity block to the appropriate location for the redundancy information of the stripe. Though data of the stripe may be meaningless (i.e., leftover) the stripe will then be XOR consistent. Such a process can be very time-consuming where the newly defined RAID level 5 volume is particularly large. Typically during such an initialization process, the RAID storage controller makes the volume (logical unit) unavailable for storage or retrieval of information by an attached host system. Frequently, such an extended delay in availability of the new volume is unacceptable.
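The conventional parity-regeneration approach described above may be illustrated by the following minimal Python sketch. The function names (xor_blocks, regenerate_parity, is_xor_consistent) and the in-memory byte strings standing in for disk blocks are assumptions made purely for illustration and do not correspond to any particular controller implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def regenerate_parity(data_blocks):
    """Read every data block of a stripe and compute its XOR parity block."""
    return xor_blocks(data_blocks)

def is_xor_consistent(data_blocks, parity):
    """A stripe is XOR consistent when data XOR parity is all zeros."""
    return xor_blocks(list(data_blocks) + [parity]) == bytes(len(parity))

# Leftover (meaningless) data on a newly defined stripe (hypothetical values).
leftover = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]
stale_parity = b"\x00\x00"
assert not is_xor_consistent(leftover, stale_parity)

# Regenerating parity makes the stripe consistent without changing the data.
parity = regenerate_parity(leftover)
assert is_xor_consistent(leftover, parity)
```

Every data block of every stripe must be read in this approach, which suggests why the process may be time-consuming for a large volume.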

[0013] Another prior technique described in the '023 patent allows I/O requests to be processed during initialization of a newly defined RAID level 5 volume. As the initialization process proceeds, a threshold indicator tracks its progress. If a host I/O request uses stripes that fall below the progress threshold, the request is handled normally. I/O requests are requeued for later processing by the controller if any part of the request involves uninitialized portions of the new volume. This prior technique allows the logical unit to be referenced by attached host systems but defers the actual I/O operations until appropriate portions of the newly defined logical unit have been initialized (i.e., made XOR consistent).

[0014] A similar problem is addressed in the '384 application where a logical unit is migrated to another logical unit. The methods and structures discussed therein permit I/O operations to proceed substantially in parallel with the data copying operation.

[0015] It is evident from the above discussion that an ongoing problem exists in making a newly defined RAID volume available for host I/O request processing as soon as possible.

SUMMARY OF THE INVENTION

[0016] The present invention solves the above and other problems, thereby advancing the state of the useful arts, by providing methods and structure for performing initialization of a newly defined RAID logical unit as a background processing task and for substantially simultaneously processing host system I/O requests during the initialization process. More specifically, background initialization processing makes stripes on the newly defined RAID volume XOR consistent by writing all zeros to data blocks associated with the newly defined volume. A data structure associated with the methods of the present invention provides information regarding progress of the background initialization process. Host I/O requests to write information in a portion of the newly defined volume for which initialization has completed may be performed using standard read-modify-write (RMW) operations to affected stripes. Host generated I/O requests to write information in portions of the newly defined volume for which initialization has not completed are performed by generating parity for the entire stripe affected by the I/O request and writing the generated parity along with the affected data thereby making the affected stripe XOR consistent. The latter approach is often referred to as a read-peer-write process in that it reads the peers of the affected data blocks (the other blocks of the stripe) and determines the new parity information to be written back. Any write operations performed (either using read-modify-write or read-peer-write) record progress information used by the background initialization process to identify stripes that may not be cleared by the zeroing of blocks.
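The reason zeroing makes a stripe XOR consistent is that the XOR sum of all-zero data blocks is itself all zeros, so no data need be read and no parity need be computed. The following short Python sketch illustrates this under the assumption that a stripe is modeled as in-memory byte strings; the names zero_initialize and is_xor_consistent are hypothetical.

```python
def zero_initialize(num_data_blocks, block_size):
    """Return (data_blocks, parity) for a stripe cleared to all zeros."""
    zero = bytes(block_size)  # a block of block_size zero bytes
    return [zero] * num_data_blocks, zero

def is_xor_consistent(data_blocks, parity):
    """True when the XOR of all data blocks equals the parity block."""
    acc = bytearray(len(parity))
    for block in data_blocks:
        for i, byte in enumerate(block):
            acc[i] ^= byte
    return bytes(acc) == parity

# A zeroed stripe is consistent by construction (sizes are illustrative).
data, parity = zero_initialize(num_data_blocks=4, block_size=8)
assert is_xor_consistent(data, parity)
```

Because zeroing destroys whatever is in the stripe, a stripe already written by a host I/O request must not be cleared in this manner, which is why the progress information described above is recorded.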

[0017] A feature of the invention therefore provides a method for initializing a storage volume comprising: making the volume XOR consistent; and processing an I/O request received from an attached host system substantially in parallel with the step of making the volume XOR consistent.

[0018] Another aspect of the invention further provides for storing first progress information regarding the step of making the volume XOR consistent in a non-volatile memory.

[0019] Another aspect of the invention provides that the step of processing further comprises: determining if the I/O request affects only portions of the volume already made XOR consistent; performing the I/O request using read-modify-write processing if the I/O request affects only portions of the volume already made XOR consistent; and performing the I/O request using read-peer-write processing if the I/O request affects any portion of the volume not already made XOR consistent.

[0020] Another aspect of the invention further provides that the step of performing read-peer-write processing further comprises: storing second progress information in a memory indicating portions of the volume made XOR consistent by the read-peer-write processing.

[0021] Another aspect of the invention further provides that the step of making further comprises: determining from the first progress information and from the second progress information whether a portion of the volume to be initialized has been modified by read-peer-write processing; making the portion XOR consistent using write-same logic to clear the portion if the portion has not been modified by read-peer-write processing; making the portion XOR consistent using write-parity logic to compute parity if the portion has been modified by read-peer-write processing; and updating the first progress indicator to indicate completion of making the portion XOR consistent.

[0022] Another aspect of the invention further provides for dividing the volume into a plurality of zones, wherein the step of storing first progress information further comprises: storing threshold indicia indicative of the progress of the step of making the volume XOR consistent wherein zones below the threshold indicia have been made XOR consistent and wherein zones above the threshold indicia are not yet XOR consistent, and wherein the step of storing second progress information further comprises: storing a zone bitmap structure in the non-volatile memory wherein each bit of the zone bitmap structure corresponds to a zone of the plurality of zones and wherein each bit of the zone bitmap structure indicates whether processing of an I/O request has modified any portion of the corresponding zone of the volume.

[0023] Another aspect of the invention further provides for dividing each zone into a plurality of sub-zones, wherein the step of storing second progress information further comprises: storing a sub-zone bitmap structure in a memory wherein each bit of the sub-zone bitmap structure corresponds to a sub-zone of the plurality of sub-zones and wherein each bit of the sub-zone bitmap structure indicates whether processing of an I/O request has modified any portion of the corresponding sub-zone of the volume.

[0024] Another aspect of the invention further provides that the step of making further comprises: determining from the first progress information and from the second progress information whether a portion of the volume to be initialized has been modified by read-peer-write processing; making the portion XOR consistent using write-same logic to clear the portion if the portion has not been modified by read-peer-write processing; making the portion XOR consistent using write-parity logic to compute parity if the portion has been modified by read-peer-write processing; and updating the first progress indicator to indicate completion of making the portion XOR consistent.

[0025] Another aspect of the invention further provides for pausing the step of making the volume XOR consistent; saving the first progress information and the second progress information on disk drives of the volume; restoring the saved first progress information and the saved second progress information from disk drives of the volume; and resuming the step of making the volume XOR consistent in accordance with the restored first progress information and the restored second progress information.

[0026] Another feature of the invention provides a storage system comprising: a plurality of disk drives; and a storage controller coupled to the plurality of disk drives and adapted to receive and process I/O requests from an attached host system, wherein the storage controller is adapted to define a volume comprising a portion of each of one or more disk drives of the plurality of disk drives, and wherein the storage controller is adapted to make the volume XOR consistent substantially in parallel with processing of I/O requests from the attached host system.

BRIEF DESCRIPTION OF THE DRAWINGS

[0027] FIG. 1 is a block diagram of an exemplary storage system embodying features and aspects of the invention.

[0028] FIG. 2 is a flowchart describing a read-modify-write process.

[0029] FIG. 3 is a flowchart describing a read-peer-write process.

[0030] FIG. 4 is a flowchart of a storage volume initialization process.

[0031] FIG. 5 is a flowchart of an I/O request processing method.

[0032] FIG. 6 is a flowchart of a process for background initialization of a volume.

[0033] FIG. 7 is a flowchart of a process to make a portion of a volume XOR consistent.

[0034] FIG. 8 is a flowchart of a process to write a portion of a volume to force it to be XOR consistent.

[0035] FIG. 9 is a block diagram depicting structures useful to indicate progress of background initialization processing.

DETAILED DESCRIPTION OF THE DRAWINGS

[0036] FIG. 1 is a block diagram of a storage system 100 advantageously applying features and aspects of the present invention to improve initialization of a storage volume within the storage system. Storage system 100 may include a plurality of disk drives 122 through 130. Storage controller 102 manages interaction with the disk drives to achieve desired reliability, performance and host interaction. Volume 120 may be defined by such a storage controller 102 as a logical partition of the available storage space defined by disks 122 through 130. In general, such a volume 120 comprises a portion of the capacity of one or more of the available disk drives in the storage system. The portion may comprise any fraction of the available capacity of each disk drive up to and including the entirety of each disk drive. As shown in FIG. 1, exemplary volume 120 comprises a portion of each of disk drives 126, 128 and 130.

[0037] Storage controller 102 may include a CPU 108 for controlling operation of the controller and overall management of the storage system. CPU 108 may be coupled to DRAM 106 for storage and retrieval of information. Information stored in such DRAM may include program instructions, cache data buffers, and initialization progress information 150 (as discussed further herein below). CPU 108 may also store information in nonvolatile memory (NVRAM) 104. Examples of information stored in such a nonvolatile memory may include configuration information and initialization progress information 152 (as discussed further herein below).

[0038] CPU 108 may also be coupled to attached host computing systems (not shown) via host interface 112. Still further, CPU 108 may be coupled to disk drives 122 through 130 via device interface 110. Host interface 112 couples the storage controller 102 to any of a number of well-known, commercially available or customized communication media for exchanging information with attached host systems. Exemplary of such communication media are Fibre Channel, parallel SCSI, etc. Still further, storage controller 102 is coupled through device interface 110 to disk drives 122 through 130 via any of several well-known, commercially available or customized communication media. Examples of such communication media for exchange of information between storage controller 102 and disk drives 122 through 130 are Fibre Channel, parallel SCSI, etc. Those of ordinary skill in the art will readily recognize numerous equivalent communication media and protocols for coupling a storage subsystem storage controller 102 to attached host systems and to disk drives within the storage subsystem. Further, those of ordinary skill in the art will readily recognize numerous equivalent structures for the design of storage controller 102. Additional components may be resident within such a storage controller and may be interconnected using any of several structures as well-known in the art. In addition, those of ordinary skill in the art will recognize that in high reliability and/or high-performance storage system applications, it may be common to utilize multiple storage controller devices for improved reliability through redundancy, for improved performance through parallel operation, or both. It is also common that redundant communication media paths may be employed to improve reliability and performance. FIG. 1 is therefore merely intended as exemplary of one possible storage system configuration in which features and aspects of the present invention may be beneficially applied.

[0039] In FIG. 1 it may be noted that initialization progress information may be stored in the various memory devices associated with the storage subsystem. Initialization progress information 150 may be generated and maintained within DRAM 106, progress information 152 may be maintained within nonvolatile memory 104 and progress information 154 may be maintained on a portion of volume 120. Features and aspects of the present invention generally provide for improved operations for initialization of volume 120. As noted above, in many storage system applications, a newly defined volume 120 must be initialized. In particular, in RAID storage management subsystems, a RAID volume must be initialized to make it XOR consistent. As noted above, XOR consistent as used herein refers to a state of the volume wherein redundancy information associated with a RAID management of the volume is consistent with the associated data of the volume. For example, in RAID level 5 storage management, XOR parity information associated with data of the volume must be made consistent with the data stored on a newly defined volume. Similarly, RAID level 6 storage management utilizes multiple forms of such redundant information and must be made consistent following definition of a new volume. Still further, RAID level 1 provides for complete mirroring of data and therefore must be made consistent in that a full mirror must be generated from the existing data of the volume. “XOR consistent” therefore means more generally that redundant information is consistent with its associated data in a newly defined volume.

[0040] Storage controller 102 of FIG. 1 is generally operable to perform such initialization of a newly defined volume 120. Various structures generated and maintained within any or all of progress information 150, 152 and 154 are used to track progress of the initialization of a newly defined volume 120. The progress information is used to allow overlapped, parallel operations of both I/O request processing and initialization of a newly defined volume. Whereas in the past, initialization of a volume could impose significant delays on processing of any I/O requests, features and aspects of the present invention utilize initialization progress information as shown in FIG. 1 to coordinate such overlap of I/O request processing and volume initialization.

[0041] FIG. 4 is a flowchart describing a method of the present invention for initializing a newly defined volume (LUN) as a background process while allowing processing of I/O requests in parallel therewith. Element 400 is first operable to reset progress information regarding the initialization process. As discussed further herein below, progress information may include a threshold or watermark indicator indicative of the progress of background initialization. In addition, progress information reset by element 400 may include data structures indicative of I/O request operations performed in parallel with the background initialization process. All such structures are reset as appropriate by operation of element 400. Element 402 is then operable to set state information indicating that the volume or LUN being initialized is not presently initialized. This state information may be used as discussed further herein below to adapt processing of I/O requests as required for an initialized vs. uninitialized volume. Element 404 is operable to commence background processing of the initialization task for the newly defined volume. Initialization of the newly defined volume proceeds as a background task in parallel with processing of I/O requests from attached host systems. Background volume initialization and I/O request processing are coordinated through updates to the progress information initially reset by operation of element 400.

[0042] As noted above, progress information may include a threshold or watermark indicator of the progress of the initialization process. Further, progress information may include data structures indicative of I/O request processing performed substantially in parallel with the initialization process. The initialization process and I/O request processing features cooperate and coordinate through such shared progress information to allow both processes to proceed substantially in parallel. FIG. 9 shows an exemplary structure representative of such progress information.

[0043] Progress information 900 may include a first progress indicator or first progress information as watermark indicator 904. Watermark indicator 904 may be updated periodically as background initialization processing proceeds. Watermark indicator 904 may be updated as each stripe is initialized in the volume or may be updated less frequently as groups of stripes are initialized. Watermark indicator 904 may be stored in any suitable memory structure including, for example, a nonvolatile memory component associated with the storage controller or, for example, a reserved progress information area within the volume being initialized. If the watermark indicator 904 is stored in a nonvolatile memory structure (including, for example, a reserved area of the volume being initialized), the initialization process may be paused and resumed in response to a power outage, a normal shutdown procedure, or other interruption of the initialization process. Stripes of the volume may be grouped into “zones” 902 as shown in FIG. 9. As shown in FIG. 9, watermark indicator 904 indicates progress of background initialization through some stripe approximately in the middle of the fifth defined zone of the logical unit or volume being initialized.

[0044] A data structure may be provided as a second progress indicator or second progress information indicating zones in which I/O requests have been processed substantially in parallel with the background initialization task. In one exemplary embodiment, zone bitmap structure 906 may be defined having one storage bit associated with each defined zone of the volume being initialized. Such a zone bitmap structure 906 may be stored in nonvolatile memory associated with the storage controller or may be stored in a reserved portion of the volume being initialized or other nonvolatile memory components.

[0045] The bitmap representation of such a structure is a compact, rapidly accessible structure for determining whether any I/O request operation was performed in the associated zone. If such a request is performed, the associated zone's bitmap structure bit may be set. Where no I/O request has been processed in the corresponding zone during the background initialization process, the associated zone bitmap bit may be reset.

[0046] The number of zones defined within the logical unit being initialized, and hence the number of bits in the zone bitmap structure 906, may be determined largely in accordance with capacity of the nonvolatile memory component associated with the storage controller. Often, a nonvolatile memory component within a storage controller is costly and hence sparingly utilized. Where, for example, only 64 bits or 128 bits of nonvolatile memory may be allocated for the zone bitmap structure, the number of zones 902 within the volume being initialized would also be limited correspondingly to 64 or 128. Each zone 902 therefore may include a significant number of stripes within the total capacity of the volume being initialized. Zone bitmap structure 906 therefore includes a single bit per zone indicating whether any of the plurality of stripes within the corresponding zone have been modified by parallel processing of an I/O request.
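The relationship between the nonvolatile memory budget and the zone size may be illustrated by the following Python sketch. The class name ZoneBitmap, its methods, and the example volume of 1,000,000 stripes are hypothetical values chosen for illustration; the bitmap is modeled as a Python integer rather than as actual nonvolatile memory.

```python
def stripes_per_zone(total_stripes, zone_bits):
    """Stripes covered by each zone bit, rounded up so every stripe is covered."""
    return -(-total_stripes // zone_bits)  # integer ceiling division

class ZoneBitmap:
    """One bit per zone; a set bit means some stripe in the zone was written."""

    def __init__(self, total_stripes, zone_bits=64):
        self.per_zone = stripes_per_zone(total_stripes, zone_bits)
        self.bits = 0

    def zone_of(self, stripe):
        return stripe // self.per_zone

    def mark_modified(self, stripe):
        self.bits |= 1 << self.zone_of(stripe)

    def is_modified(self, stripe):
        return bool((self.bits >> self.zone_of(stripe)) & 1)

# With 64 bits for 1,000,000 stripes, each zone bit covers 15,625 stripes.
bitmap = ZoneBitmap(total_stripes=1_000_000, zone_bits=64)
bitmap.mark_modified(500_000)
assert bitmap.is_modified(500_001)      # same zone, so reported as modified
assert not bitmap.is_modified(499_999)  # neighboring zone is untouched
```

The coarse granularity is apparent in the example: a single write marks every stripe of its zone as possibly modified.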

[0047] The background initialization process may determine how to most rapidly initialize stripes in each zone by referring to the zone bitmap structure 906. Where the zone bitmap bit corresponding to the zone including the next stripe indicates that no stripes have been modified by parallel processing of an I/O request, the initialization process may apply a rapid initialization technique to initialize each stripe in that zone. However, where the zone bitmap structure for a particular zone indicates some stripes have been modified, the initialization process may need to more carefully analyze whether the rapid initialization may be utilized. To permit the initialization process to determine with finer granularity which stripes of a particular zone have been modified, sub-zones may be defined within each zone. A sub-zone may be any portion of the stripes within the corresponding zone. Sub-zone data structure 908 may be implemented as a bitmap structure where the number of bits corresponds to the number of sub-zones within the corresponding zone.

[0048] Because the sub-zone bitmap structure 908 may be substantially larger than the zone bitmap structure 906, the sub-zone bitmap structure 908 may preferably be stored in less costly, more abundant DRAM or other memory associated with the storage controller. Where the storage controller memory capacity so permits, sub-zones may be defined to the level of granularity of a particular stripe. In that case, each bit of a sub-zone bitmap structure 908 corresponds to one stripe of the corresponding zone. Where memory capacity for storing the sub-zone bitmap structure 908 is more limited, the sub-zone may represent multiple stripes but still a smaller group than that of the zone. Aspects and features of the present invention may then utilize the zone bitmap structure 906 and sub-zone bitmap structure 908 in combination to determine more precisely which stripes of a zone have been modified by parallel processing of an I/O request. In particular, background initialization processing may inspect a zone bitmap structure to determine if any stripes in a corresponding zone have been modified by an I/O request. If the zone bitmap element indicates that some stripe has been modified, the corresponding sub-zone structure may be inspected to determine exactly which stripes have been modified.
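The two-level inspection described above may be sketched in Python as follows. The class ProgressInfo, the function init_action, and the chosen geometry (4 zones, 4 sub-zones per zone, 2 stripes per sub-zone) are illustrative assumptions only; the terms write-same and write-parity are those used in the summary above, and both bitmaps are modeled as integers in ordinary memory.

```python
class ProgressInfo:
    """Zone bitmap plus one sub-zone bitmap per zone (hypothetical layout)."""

    def __init__(self, zones, subzones_per_zone, stripes_per_subzone):
        self.subzones_per_zone = subzones_per_zone
        self.stripes_per_subzone = stripes_per_subzone
        self.zone_bits = 0                 # coarse bitmap
        self.subzone_bits = [0] * zones    # finer bitmap, one per zone

    def _locate(self, stripe):
        """Return (zone index, sub-zone index within that zone)."""
        return divmod(stripe // self.stripes_per_subzone, self.subzones_per_zone)

    def mark_written(self, stripe):
        zone, sub = self._locate(stripe)
        self.zone_bits |= 1 << zone
        self.subzone_bits[zone] |= 1 << sub

    def was_written(self, stripe):
        zone, sub = self._locate(stripe)
        if not (self.zone_bits >> zone) & 1:
            return False                   # whole zone untouched: fast path
        return bool((self.subzone_bits[zone] >> sub) & 1)

def init_action(progress, stripe):
    """Pick how background initialization should treat a stripe."""
    return "write-parity" if progress.was_written(stripe) else "write-same"

progress = ProgressInfo(zones=4, subzones_per_zone=4, stripes_per_subzone=2)
progress.mark_written(13)                  # falls in zone 1, sub-zone 2
assert init_action(progress, 12) == "write-parity"  # shares the sub-zone
assert init_action(progress, 14) == "write-same"    # same zone, other sub-zone
```

In this sketch the zone bit alone would have forced all eight stripes of zone 1 onto the slower path, whereas the sub-zone bit narrows that to two stripes.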

[0049] Use of the exemplary data structures of FIG. 9 will be further defined with reference to methods discussed herein. The data structures for generation and maintenance of first and second progress information depicted in FIG. 9 are intended as merely representative of one of a number of equivalent structures readily apparent to those of ordinary skill in the art. Further, those of ordinary skill in the art will recognize that the structures may be stored in any appropriate memory medium associated with the storage controller. Preferably some or all of the various progress information structures may be stored in a nonvolatile memory so that initialization may proceed where interrupted following, for example, interruption by power failure or other reset logic.

[0050] FIGS. 5 and 6 are flowcharts describing operation of an exemplary I/O request processing method and an exemplary background initialization method, respectively. In particular the process of FIG. 5 is operable in response to receipt of an I/O write request from an attached host system. Element 500 first determines if the affected logical unit (volume) has already completed initialization (i.e., is in an initialized state as discussed above). If so, element 502 is operable to perform normal I/O request processing using read-modify-write (RMW) or other I/O request processing techniques for standard I/O write request processing. If element 500 determines that the affected volume has not yet been fully initialized, element 504 is next operable to inspect the threshold or watermark progress information discussed above to determine whether the initialization process has progressed sufficiently to initialize the area affected by the present I/O request. If so, element 502 is operable as above to complete the I/O request processing using RMW or other I/O write processing techniques. If element 504 determines that the affected portion of the volume has not yet been initialized, element 506 is operable to perform the desired I/O write request using read-peer-write processing as discussed further below. Read-peer-write processing, though slower than standard RMW processing, ensures that the affected stripe will be made XOR consistent by virtue of the I/O request processing. Element 508 is then operable to update the progress information to indicate the stripes affected by processing of the I/O write request. As noted above, element 508 may update zone and sub-zone bitmap structures indicating the stripes affected by the processing of the I/O request.
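The decision flow of FIG. 5 may be sketched in Python as follows. The function name handle_write and the dictionary used for volume state are hypothetical. The sketch assumes that the watermark holds the index of the first stripe not yet initialized, so that all stripes below it are XOR consistent, and it records affected stripes in a simple set in place of the zone and sub-zone bitmap structures.

```python
def handle_write(state, first_stripe, last_stripe):
    """Return the write method chosen for a request spanning the given stripes."""
    if state["initialized"]:                 # element 500: volume fully initialized
        return "RMW"                         # element 502
    if last_stripe < state["watermark"]:     # element 504: entirely below watermark
        return "RMW"                         # element 502
    # Element 506: some affected stripe is not yet initialized, so use RPW.
    # Element 508: record the affected stripes as progress information.
    state["modified"].update(range(first_stripe, last_stripe + 1))
    return "RPW"

state = {"initialized": False, "watermark": 100, "modified": set()}
assert handle_write(state, 10, 20) == "RMW"    # wholly within initialized area
assert handle_write(state, 95, 105) == "RPW"   # straddles the watermark
assert 105 in state["modified"]
```

A request that straddles the watermark is handled entirely by read-peer-write in this sketch, consistent with the requirement that RMW be used only where the entire request affects initialized portions of the volume.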

[0051] As discussed above, I/O request processing may use RMW processing techniques or read-peer-write processing techniques. FIG. 2 is a flowchart describing an exemplary read-modify-write (RMW) operation. In general, an RMW operation is used for writing information in a RAID storage subsystem where redundancy information must be updated in response to the updating of a portion of associated stripe data. For example, in a RAID level 5 or RAID level 6 storage volume, a stripe consists of a number of blocks or portions of data, each on one of the plurality of disk drives associated with the volume. In addition, the stripe includes redundancy information associated with the data portion of the stripe. Where an I/O request writes to a portion of the data of a stripe in a volume, RMW processing may be invoked to rapidly update the affected data and redundancy portions of the stripe.

[0052] Element 200 of FIG. 2 is first operable to read the present data about to be updated (D1′) and the present parity portion of the stripe (P1′). Element 202 then writes the new data portion (D1) of the stripe on the appropriate destination drive or drives of the storage system. The I/O request may update only a portion of the stripe represented by D1. That portion may span one or more disk drives of the volume. Element 204 then computes a new parity portion of the stripe as the XOR sum of the modified data portion (D1), the previous parity information (P1′) and the previous data information (D1′). Lastly, element 206 is operable to write the newly computed parity information to the appropriate destination drives of the storage system.
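
The RMW steps of FIG. 2 can be illustrated with a small in-memory model. This is a sketch under stated assumptions: a stripe is represented as a dictionary holding a list of data blocks and one parity block, and the names `xor_blocks` and `read_modify_write` are hypothetical.

```python
def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def read_modify_write(stripe, index, new_data):
    """Update one data block and the parity of an XOR-consistent stripe."""
    old_data = stripe["data"][index]       # element 200: read D1'
    old_parity = stripe["parity"]          # element 200: read P1'
    stripe["data"][index] = new_data       # element 202: write D1
    # Elements 204 and 206: new parity = D1 XOR P1' XOR D1', then write it.
    stripe["parity"] = xor_blocks(new_data, old_parity, old_data)


# Usage: start from a consistent stripe, then update one block.
stripe = {"data": [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"], "parity": b""}
stripe["parity"] = xor_blocks(*stripe["data"])
read_modify_write(stripe, 1, b"\xaa\x55")
```

The result is only correct if the stripe was XOR consistent beforehand, which is why RMW is reserved for portions of the volume already initialized.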

[0053] By contrast, FIG. 3 describes a method for read-peer-write (RPW) processing of an I/O request. Where the parity information of an affected stripe is not presently known to be XOR consistent, an RPW operation will ensure that the stripe becomes XOR consistent by virtue of the write operations. Element 300 first reads all unaffected data portions of the affected stripe (Dm . . . Dn). Element 302 then writes the new data portion of the affected stripe to appropriate destination drives. Affected portions of the stripe are represented as N1 . . . Nm. Element 304 is then operable to compute parity for the affected stripe as the XOR sum of the new data portions (N1 . . . Nm) and unaffected data portions (Dm . . . Dn). Lastly, element 306 is operable to write the newly computed parity to an appropriate destination drive for the affected stripe. Where the new data (N1 . . . Nm) represents the entirety of the stripe, hence no additional portions of data need be read by operation of element 300, the RPW operation is often referred to as a full stripe write operation.
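
The RPW steps of FIG. 3 can be sketched in the same spirit. The stripe model (a dictionary with a list of data blocks and one parity block) and the names `xor_blocks` and `read_peer_write` are assumptions made for illustration only.

```python
def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def read_peer_write(stripe, new_portions):
    """Write new blocks (index -> bytes) and rebuild parity from scratch.

    The old parity is never read, so the stripe ends up XOR consistent
    even if it was not consistent before the write.
    """
    # Element 300: read the unaffected ("peer") data portions.
    peers = [d for i, d in enumerate(stripe["data"]) if i not in new_portions]
    # Element 302: write the new data portions.
    for i, block in new_portions.items():
        stripe["data"][i] = block
    # Elements 304 and 306: parity = XOR of new and unaffected portions.
    stripe["parity"] = xor_blocks(*new_portions.values(), *peers)


# Usage: the parity starts out as arbitrary, uninitialized content.
stripe = {"data": [b"\x11\x22", b"\x33\x44", b"\x55\x66"], "parity": b"\xde\xad"}
read_peer_write(stripe, {0: b"\xa0\x0a"})
```

When `new_portions` covers every data block, `peers` is empty and the call behaves as a full stripe write.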

[0054] The RMW processing of FIG. 2 and the RPW processing of FIG. 3 may be readily adapted by those of ordinary skill in the art to process multiple stripes where some of the multiple stripes are processed as “full stripe” write operations and other partial stripe modifications may be processed either as RMW or RPW operations.
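
One way to picture the multi-stripe case is to split a request into per-stripe pieces and label each piece as a full or partial stripe write. The sketch below assumes a request addressed by data block number and a fixed number of data blocks per stripe; the function name `split_by_stripe` and its parameters are hypothetical.

```python
def split_by_stripe(start_block, block_count, blocks_per_stripe):
    """Return a list of (stripe, offset, length, kind) tuples for a write.

    kind is "full" when the piece covers every data block of the stripe,
    otherwise "partial" (to be handled as RMW or RPW).
    """
    plan = []
    block = start_block
    end = start_block + block_count
    while block < end:
        stripe = block // blocks_per_stripe
        offset = block % blocks_per_stripe
        length = min(blocks_per_stripe - offset, end - block)
        kind = "full" if length == blocks_per_stripe else "partial"
        plan.append((stripe, offset, length, kind))
        block += length
    return plan
```

For example, a write of 8 blocks starting at block 2 with 4 data blocks per stripe yields a partial piece in stripe 0, a full stripe write for stripe 1, and a partial piece in stripe 2.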

[0055] FIG. 6 is a flowchart of an exemplary background initialization process associated with aspects of the present invention to provide rapid, background initialization of a volume substantially in parallel with I/O request processing as discussed above with respect to FIG. 5. Once initiated, element 600 (of FIG. 6) first determines whether more stripes of the volume remain to be initialized (made XOR consistent). If not, initialization of the volume completes with element 612 marking the logical unit (volume) as now in the initialized state. As noted above, the initialized state allows I/O request processing to proceed normally. If element 600 determines that additional stripes remain to be initialized in the logical unit, element 602 is next operable to determine from the progress information whether the next stripe to be initialized has already been written by processing of an I/O request. In particular, element 602 may inspect the zone and sub-zone bitmap structures to determine whether this particular stripe has been previously written by an I/O request processed in parallel with the background initialization task. If element 604 determines that the stripe has been written, element 608 is next operable to use a make-consistent process to ensure initialization of the stripe (as discussed further below).

[0056] Where the zone and sub-zone progress information as discussed above provides sufficient granularity to know precisely which stripes have been written, processing of element 608 is unnecessary in that the RPW processing performed by the earlier I/O request has already ensured XOR consistency of the associated stripe. However, where the granularity of the zone and sub-zone information does not specify individual stripes but rather groups of stripes, element 608 may be performed to ensure XOR consistency of the stripe.

[0057] Where element 604 determines that the next stripe has not been written by a previous I/O request, element 606 is next operable to use a write-same process as discussed further herein below to ensure XOR consistency of the next stripe. In both cases, following operation of element 606 or 608, element 610 is operable to update progress information to indicate further progress of the background initialization process. In particular, element 610 may update the threshold or watermark indicator discussed above to indicate further progress in the initialization process. Processing then continues by looping back to element 600 to continue initializing further stripes of the volume.
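
The background loop of FIG. 6 can be summarized by the following sketch. It assumes the progress information is a dictionary with a stripe-index watermark, that the set of stripes possibly written by host I/O is already known, and that the make-consistent and write-same operations are supplied as callables. All names here are hypothetical.

```python
def background_initialize(stripe_count, written_stripes, progress,
                          make_consistent, write_same):
    """Walk the volume from the watermark to the end, one stripe at a time."""
    while progress["watermark"] < stripe_count:    # element 600: stripes remain?
        index = progress["watermark"]
        if index in written_stripes:               # elements 602 and 604
            make_consistent(index)                 # element 608
        else:
            write_same(index)                      # element 606
        progress["watermark"] = index + 1          # element 610
    progress["initialized"] = True                 # element 612


# Usage: record which operation is chosen for each stripe.
log = []
progress = {"watermark": 0, "initialized": False}
background_initialize(
    4, {2}, progress,
    make_consistent=lambda i: log.append(("make_consistent", i)),
    write_same=lambda i: log.append(("write_same", i)),
)
```

Because the loop starts at the stored watermark, the same function could also resume an interrupted initialization, provided the progress information was preserved.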

[0058] FIG. 7 is a flowchart showing details of an exemplary make-consistent process as discussed above with respect to element 608. The make-consistent process is essentially identical to the RPW processing discussed above with respect to FIG. 3 except that no new data is provided. Rather, element 700 reads all data portions of the stripe (D1 . . . Dn). Element 702 then computes a new parity portion of the stripe as the XOR sum of all data portions (D1 . . . Dn). Element 704 is then operable to write the newly computed parity information to the appropriate destination drive of the stripe.
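
A minimal sketch of the make-consistent process of FIG. 7 follows, again assuming a stripe modeled as a dictionary with a list of data blocks and one parity block. The function name `make_consistent` is hypothetical.

```python
from functools import reduce


def make_consistent(stripe):
    """Recompute parity from the data already on the stripe; data is untouched."""
    data = stripe["data"]                                  # element 700: read D1..Dn
    parity = bytes(
        reduce(lambda a, b: a ^ b, column)                 # element 702: XOR sum
        for column in zip(*data)
    )
    stripe["parity"] = parity                              # element 704: write parity


# Usage: host data is preserved while stale parity is replaced.
stripe = {"data": [b"\x01\x02", b"\x04\x08"], "parity": b"\xff\xff"}
make_consistent(stripe)
```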

[0059] FIG. 8 is a flowchart describing an exemplary process for implementing the write-same procedure for fast initialization of a stripe. In particular, element 800 is operable to write all data portions and the associated parity portion of the stripe to zeros. An all zero stripe is XOR consistent as regards typical XOR parity of RAID level 5. For other RAID level systems, similar fast write operations may be utilized to effectuate rapid initialization of an associated stripe. In general, element 800 is operable to perform the fastest operation possible to make a given stripe XOR consistent where the data portion does not presently have relevant data to be retained.
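
The write-same step and the reason an all-zero stripe is XOR consistent can be shown with a short sketch. The stripe model and the names `write_same` and `is_xor_consistent` are assumptions for illustration, and the check applies to single XOR parity as in RAID level 5.

```python
def write_same(stripe, block_size):
    """Element 800: overwrite every data block and the parity block with zeros."""
    zero = bytes(block_size)
    stripe["data"] = [zero for _ in stripe["data"]]
    stripe["parity"] = zero


def is_xor_consistent(stripe):
    """True when the XOR of all data blocks and the parity block is all zeros."""
    acc = bytearray(stripe["parity"])
    for block in stripe["data"]:
        for i, byte in enumerate(block):
            acc[i] ^= byte
    return not any(acc)


# Usage: uninitialized content becomes consistent without reading anything.
stripe = {"data": [b"\x01\x02", b"\x03\x04"], "parity": b"\xff\xff"}
write_same(stripe, 2)
```

No reads are needed, which is what makes this the fast path for stripes holding no data worth retaining.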

[0060] Further aspects of the present invention allow for improved utilization of costly nonvolatile memory as compared to less costly, more abundant DRAM devices. In one exemplary embodiment, the zone bitmap structure may be stored in nonvolatile memory. Typically such nonvolatile memory is slower to read and write as compared to less costly DRAM components. Therefore, in one aspect of the invention, the zone bitmap structure is written both to nonvolatile memory and substantially simultaneously written to a shadow copy in DRAM. The nonvolatile version may be used to recover from a power reset or other interruption of the initialization process while the DRAM version may be used as a shadow copy for more rapid read access to the zone bitmap data structure.
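
The dual-copy arrangement can be sketched as a write-through bitmap. In this illustration the nonvolatile region is simulated by an ordinary `bytearray`; the class name `ShadowedZoneBitmap` and its methods are hypothetical, and a real controller would use its own nonvolatile memory interface.

```python
class ShadowedZoneBitmap:
    """Zone bitmap kept in (simulated) nonvolatile memory with a DRAM shadow."""

    def __init__(self, zone_count, nvram=None):
        size = (zone_count + 7) // 8
        # Assumption: `nvram` stands in for a region of nonvolatile memory.
        self.nvram = nvram if nvram is not None else bytearray(size)
        # The shadow is rebuilt from the nonvolatile copy, e.g. after a reset.
        self.shadow = bytearray(self.nvram)

    def mark_written(self, zone):
        """Write through: update the nonvolatile copy and the DRAM shadow."""
        byte, bit = divmod(zone, 8)
        self.nvram[byte] |= 1 << bit
        self.shadow[byte] |= 1 << bit

    def is_written(self, zone):
        """Reads are served from the faster DRAM shadow only."""
        byte, bit = divmod(zone, 8)
        return bool(self.shadow[byte] & (1 << bit))
```

Recovery after an interruption then amounts to constructing a new object over the surviving nonvolatile contents, which repopulates the shadow.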

[0061] All progress information, including the watermark indicator, zone bitmap structure and sub-zone bitmap structure, may be persistently stored in a nonvolatile memory to allow orderly shutdown of the storage system during an initialization process and orderly resumption of the interrupted initialization process. Upon resumption of the initialization process, all progress information may be restored from the nonvolatile memory into appropriate locations of volatile and other nonvolatile memory to continue the initialization process where interrupted.

[0062] Further aspects of the present invention allow dynamic allocation and compacting (“garbage collection”) of the elements of progress information stored in DRAM or other memory. As background initialization proceeds from beginning to end of the new volume, zone and sub-zone bitmap structures for portions of the volume already initialized may be released, compacted and reallocated for the remaining volume initialization. Such compaction and reallocation may permit finer granularity in the zone and sub-zone definition as the initialization proceeds. Such compaction (“garbage collection”) programming techniques may be implemented as appropriate to the particular data structures chosen for implementation of the progress information. Programming paradigms for such dynamic allocation and control of memory are well known to those of ordinary skill in the art.
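
The release step of such a scheme might look like the sketch below. It covers only the freeing of sub-zone bitmaps for zones that lie wholly below the watermark, not their reallocation at finer granularity. The dictionary layout and the names `release_initialized`, `watermark`, and `stripes_per_zone` are assumptions for illustration.

```python
def release_initialized(sub_zone_bitmaps, watermark, stripes_per_zone):
    """Drop sub-zone bitmaps for zones entirely below the watermark.

    sub_zone_bitmaps maps a zone index to that zone's sub-zone bitmap.
    Returns the number of bitmaps released.
    """
    done = [
        zone for zone in sub_zone_bitmaps
        if (zone + 1) * stripes_per_zone <= watermark   # last stripe of zone is initialized
    ]
    for zone in done:
        del sub_zone_bitmaps[zone]
    return len(done)
```

The memory freed this way could then be reused for the zones that remain to be initialized.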

[0063] While the invention has been illustrated and described in the drawings and foregoing description, such illustration and description is to be considered as exemplary and not restrictive in character. One or more exemplary embodiments of the invention and minor variants thereof have been shown and described. Protection is desired for all changes and modifications that come within the spirit of the invention. Those skilled in the art will appreciate variations of the above-described embodiments that fall within the scope of the invention. As a result, the invention is not limited to the specific examples and illustrations discussed above, but only by the following claims and their equivalents.

What is claimed is:
 1. A method for initializing a storage volume comprising: making the volume XOR consistent; and processing an I/O request received from an attached host system substantially in parallel with the step of making the volume XOR consistent.
 2. The method of claim 1 further comprising: storing first progress information regarding the step of making the volume XOR consistent in a non-volatile memory.
 3. The method of claim 2 wherein the step of processing further comprises: determining if the I/O request affects only portions of the volume already made XOR consistent; performing the I/O request using read-modify-write processing if the I/O request affects only portions of the volume already made XOR consistent; and performing the I/O request using read-peer-write processing if the I/O request affects any portion of the volume not already made XOR consistent.
 4. The method of claim 3 wherein the step of performing read-peer-write processing further comprises: storing second progress information in a memory indicating portions of the volume made XOR consistent by the read-peer-write processing.
 5. The method of claim 4 wherein the step of making further comprises: determining from the first progress information and from the second progress information whether a portion of the volume to be initialized has been modified by read-peer-write processing; making the portion XOR consistent using write-same logic to clear the portion if the portion has not been modified by read-peer-write processing; making the portion XOR consistent using write-parity logic to compute parity if the portion has been modified by read-peer-write processing; and updating the first progress indicator to indicate completion of making the portion XOR consistent.
 6. The method of claim 4 further comprising: dividing the volume into a plurality of zones, wherein the step of storing first progress information further comprises: storing threshold indicia indicative of the progress of the step of making the volume XOR consistent wherein zones below the threshold indicia have been made XOR consistent and wherein zones above the threshold indicia are not yet XOR consistent, and wherein the step of storing second progress information further comprises: storing a zone bitmap structure in the non-volatile memory wherein each bit of the zone bitmap structure corresponds to a zone of the plurality of zones and wherein each bit of the zone bitmap structure indicates whether processing of an I/O request has modified any portion of the corresponding zone of the volume.
 7. The method of claim 6 further comprising: dividing each zone into a plurality of sub-zones, wherein the step of storing second progress information further comprises: storing a sub-zone bitmap structure in a memory wherein each bit of the sub-zone bitmap structure corresponds to a sub-zone of the plurality of sub-zones and wherein each bit of the sub-zone bitmap structure indicates whether processing of an I/O request has modified any portion of the corresponding sub-zone of the volume.
 8. The method of claim 7 wherein the step of making further comprises: determining from the first progress information and from the second progress information whether a portion of the volume to be initialized has been modified by read-peer-write processing; making the portion XOR consistent using write-same logic to clear the portion if the portion has not been modified by read-peer-write processing; making the portion XOR consistent using write-parity logic to compute parity if the portion has been modified by read-peer-write processing; and updating the first progress indicator to indicate completion of making the portion XOR consistent.
 9. The method of claim 7 further comprising: pausing the step of making the volume XOR consistent; saving the first progress information and the second progress information on disk drives of the volume; restoring the saved first progress information and the saved second progress information from disk drives of the volume; and resuming the step of making the volume XOR consistent in accordance with the restored first progress information and the restored second progress information.
 10. A storage system comprising: a plurality of disk drives; and a storage controller coupled to the plurality of disk drives and adapted to receive and process I/O requests from an attached host system, wherein the storage controller is adapted to define a volume comprising a portion of each of one or more disk drives of the plurality of disk drives, and wherein the storage controller is adapted to make the volume XOR consistent substantially in parallel with processing of I/O requests from the attached host system.
 11. The storage system of claim 10 further comprising: a progress indicator indicating progress in operation of the storage controller to make the volume XOR consistent, wherein the storage controller is further operable to perform the I/O request using read-modify-write processing if the I/O request affects only portions of the volume already made XOR consistent as indicated by the progress indicator, and wherein the storage controller is further operable to perform the I/O request using read-peer-write processing if the I/O request affects any portion of the volume not already made XOR consistent as indicated by the progress indicator.
 12. The storage system of claim 11 further comprising: a non-volatile memory coupled to the storage controller for storing the progress indicator.
 13. The storage system of claim 10 further comprising: a first progress indicator stored in a non-volatile memory indicating progress in operation of the storage controller to make the volume XOR consistent, wherein the storage controller is further operable to perform the I/O request using read-modify-write processing if the I/O request affects only portions of the volume already made XOR consistent as indicated by the progress indicator, and wherein the storage controller is further operable to perform the I/O request using read-peer-write processing if the I/O request affects any portion of the volume not already made XOR consistent as indicated by the progress indicator; and a second progress indicator indicating portions of the volume made XOR consistent by the read-peer-write processing.
 14. The storage system of claim 13 wherein the first progress indicator comprises: a threshold indicator periodically updated to reflect progress in making the volume XOR consistent, wherein the second progress indicator comprises a bitmap structure comprising a plurality of bits wherein each bit corresponds to a zone of a plurality of zones of the volume and wherein each bit indicates whether any portion of the corresponding zone has been made XOR consistent by operation of a read-peer-write operation.
 15. The storage system of claim 14 wherein the storage controller is further operable to determine from the first progress indicator and from the second progress indicator whether a portion of the volume to be initialized has been modified by read-peer-write processing, wherein the storage controller is further operable to make the portion XOR consistent using write-same logic to clear the portion if the portion has not been modified by read-peer-write processing, wherein the storage controller is further operable to make the portion XOR consistent using write-parity logic to compute parity if the portion has been modified by read-peer-write processing.
 16. A system for initializing a storage volume comprising: means for making the volume XOR consistent; and means for processing an I/O request received from an attached host system substantially in parallel with the step of making the volume XOR consistent.
 17. The system of claim 16 further comprising: non-volatile storage means for storing first progress information regarding progress in making the volume XOR consistent.
 18. The system of claim 17 wherein the means for processing further comprises: means for determining if the I/O request affects only portions of the volume already made XOR consistent; means for performing the I/O request using read-modify-write processing if the I/O request affects only portions of the volume already made XOR consistent; and means for performing the I/O request using read-peer-write processing if the I/O request affects any portion of the volume not already made XOR consistent.
 19. The system of claim 18 wherein the means for performing read-peer-write processing further comprises: storage means for storing second progress information indicating portions of the volume made XOR consistent by the read-peer-write processing.
 20. The system of claim 19 wherein the means for making further comprises: means for determining from the first progress information and from the second progress information whether a portion of the volume to be initialized has been modified by read-peer-write processing; means for making the portion XOR consistent using write-same logic to clear the portion if the portion has not been modified by read-peer-write processing; means for making the portion XOR consistent using write-parity logic to compute parity if the portion has been modified by read-peer-write processing; and means for updating the first progress indicator to indicate completion of making the portion XOR consistent.
 21. The system of claim 19 further comprising: means for dividing the volume into a plurality of zones, wherein the first progress information further comprises: a threshold indicia indicative of the progress of the means for making the volume XOR consistent wherein zones below the threshold indicia have been made XOR consistent and wherein zones above the threshold indicia are not yet XOR consistent, and wherein the second progress information further comprises: a zone bitmap structure in the non-volatile storage means wherein each bit of the zone bitmap structure corresponds to a zone of the plurality of zones and wherein each bit of the zone bitmap structure indicates whether processing of an I/O request has modified any portion of the corresponding zone of the volume.
 22. The system of claim 21 further comprising: means for dividing each zone into a plurality of sub-zones, wherein the second progress information further comprises: a sub-zone bitmap structure wherein each bit of the sub-zone bitmap structure corresponds to a sub-zone of the plurality of sub-zones and wherein each bit of the sub-zone bitmap structure indicates whether processing of an I/O request has modified any portion of the corresponding sub-zone of the volume.
 23. The system of claim 22 wherein the means for making further comprises: means for determining from the first progress information and from the second progress information whether a portion of the volume to be initialized has been modified by read-peer-write processing; means for making the portion XOR consistent using write-same logic to clear the portion if the portion has not been modified by read-peer-write processing; means for making the portion XOR consistent using write-parity logic to compute parity if the portion has been modified by read-peer-write processing; and means for updating the first progress indicator to indicate completion of making the portion XOR consistent.
 24. The system of claim 22 further comprising: means for pausing operation of the means for making the volume XOR consistent; means for saving the first progress information and the second progress information on disk drives of the volume; means for restoring the saved first progress information and the saved second progress information from disk drives of the volume; and means for resuming operation of the means for making the volume XOR consistent in accordance with the restored first progress information and the restored second progress information.